Pose Compensation for Bimodal Speech Recognition
Abstract
Lip reading has been shown to improve speech recognition accuracy in adverse environments. Most existing lip reading systems assume a frontal pose, which makes them very difficult to use in tasks such as video transcription (speech recognition of the audio stream for video indexing and retrieval). In this paper, we propose a new method that compensates for lip pose changes by exploiting the general symmetry of the human face. From the imaging geometry, we show that a frontal lip can be recovered from only one profile view. The resulting pose compensation method has the following advantages: (1) it requires only one profile image; (2) it does not need a 3D model; (3) it does not need an accurate lip shape contour. Experimental results demonstrate the effectiveness of our method.
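The role that facial symmetry can play in such a recovery may be easier to see with a toy calculation. The sketch below is not the paper's derivation. It assumes an orthographic camera, a head rotated only about the vertical axis by a known yaw angle strictly below 90 degrees, and two lip corners that mirror each other across the face's symmetry plane. The function names and all numeric values are hypothetical, chosen only for illustration. Under these assumptions the unknown depth of the lip corners cancels when their projected positions are subtracted, so the frontal mouth width follows from a single rotated view without a 3D model.

```python
import math


def project(point, yaw):
    """Orthographic projection of a 3D point after a yaw rotation (radians).

    Assumption: rotation is about the vertical (y) axis only, so the
    vertical image coordinate is unchanged.
    """
    x, y, z = point
    u = x * math.cos(yaw) + z * math.sin(yaw)
    return (u, y)


def frontal_width(left_uv, right_uv, yaw):
    """Estimate the frontal mouth width from two projected lip corners.

    For mirror-symmetric corners (-w, y, z) and (+w, y, z), the depth
    term z*sin(yaw) is identical in both projections and cancels in the
    difference, leaving 2*w*cos(yaw).
    """
    return (right_uv[0] - left_uv[0]) / math.cos(yaw)


# Hypothetical lip corners, symmetric about the plane x = 0.
half_width, height, depth = 2.5, -3.0, 1.0
left_corner = (-half_width, height, depth)
right_corner = (half_width, height, depth)

yaw = math.radians(40)  # assumed known head rotation
observed_left = project(left_corner, yaw)
observed_right = project(right_corner, yaw)

recovered = frontal_width(observed_left, observed_right, yaw)
```

The division by cos(yaw) becomes ill-conditioned as the view approaches a full 90 degree profile, so this toy model only illustrates the symmetry argument for partial rotations. How the paper handles profile views is not stated in the abstract.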
Similar resources
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Bimodal speech recognition using coupled hidden Markov models
In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities modeling temporal influences betwee...
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the signal's time-frequency representation (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing, deleted, and then replaced using the remaining components and statistical ...
CENSREC-AV: evaluation frameworks for audio-visual speech recognition
This paper introduces upcoming evaluation frameworks for bimodal speech recognition in noisy conditions and real environments. To develop robust speech recognition in noisy environments, bimodal speech recognition, which uses acoustic and visual information, has received particular attention over the past decade. As a lot of methods and techniques for bimodal speech recognition have b...
Improved Bimodal Speech Recognition Study Based on Product Hidden Markov Model
Recent years have seen higher demands for automatic speech recognition (ASR) systems that can operate robustly in acoustically noisy environments. This paper proposes an improved product hidden Markov model (HMM) for bimodal speech recognition. A two-dimensional training model is built based on dependently trained audio-HMM and visual-HMM, reflecting the asynchronous characterist...
Publication date: 1999